Efficient Representation and Parallel Computation of String-Substring Longest Common Subsequences

نویسنده

  • Alexander Tiskin
چکیده

Given two strings a, b of length m, n respectively, the string-substring longest common subsequence (SS-LCS) problem consists in computing the length of the longest common subsequence of a and every substring of b. An explicit representation of the output lengths is of size Θ(n). We show that the output can be represented implicitly by a set of n two-dimensional integer points, where individual output lengths are obtained by dominance counting queries. This leads to a data structure of size O(n), which allows to query an individual output length in time O( logn log logn ), using a recent result by JaJa, Mortensen and Shi. The currently best sequential SS-LCS algorithm by Alves et al. can be adapted to produce the output in the above geometric representation. We also develop a new parallel SS-LCS algorithm that runs on a p-processor coarse-grained computer in O( p ) local computation, O(n log p) communication, O(log p) barrier synchronisations, and O(n) memory per processor, producing the output in the above geometric representation. Compared to previously known results, our approach presents a substantial improvement in algorithm functionality, output representation efficiency, communication efficiency and/or memory efficiency.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Family of String Classifiers Based on Local Relatedness

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classier for given two sets of strings (the positive set and the negative set)...

متن کامل

Semi-local longest common subsequences in subquadratic time

For two strings a, b of lengths m, n respectively, the longest common subsequence (LCS) problem consists in comparing a and b by computing the length of their LCS. In this paper, we define a generalisation, called “the all semi-local LCS problem”, where each string is compared against all substrings of the other string, and all prefixes of each string are compared against all suffixes of the ot...

متن کامل

All Semi-local Longest Common Subsequences in Subquadratic Time

For two strings a, b of lengths m, n respectively, the longest common subsequence (LCS) problem consists in comparing a and b by computing the length of their LCS. In this paper, we define a generalisation, called “the all semi-local LCS problem”, where each string is compared against all substrings of the other string, and all prefixes of each string are compared against all suffixes of the ot...

متن کامل

Approximate string matching as an algebraic computation

Approximate string matching has a long history and employs a wide variety of methods (see e.g. the survey [2]). We consider a variant of approximate matching that compares a fixed pattern string to every substring in the text string by a rational-weighted edit distance (e.g. the indel distance, defined as the number of character insertions and deletions, or the indelsub/Levenshtein distance, wh...

متن کامل

A BSP/CGM Algorithm for the All-Substrings Longest Common Subsequence Problem

Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y . The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < p m processors that takes O(mn=p) time an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005